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Abstract. A key goal of radio and 7— ray observations of active galactic nuclei is to characterize their time variability 
in order to elucidate physical processes responsible for the radiation. I describe algorithms for relevant time series 
analysis tools - correlation functions, Fourier and wavelet amplitude and phase spectra, structure functions, and 
time-frequency distributions, all for arbitrary data modes and sampling schemes. For example radio measurements 
can be cross-analyzed with data streams consisting of time-tagged gamma-ray photons. Underlying these methods 
is the Bayesian block scheme, useful in its own right to characterize local structure in the light curves, and also 
prepare raw data for input to the other analysis algorithms. One goal of this presentation is to stimulate discussion 
of these methods during the workshop. 



1. Introduction 

Active galactic nuclei (AGN) are highly variable at all 
wavelengths. A major fraction of their total luminosity 
fluctuates over time scales ranging from the shortest for 
which statistically significant signals can be obtained, to 
the longest time intervals over which data are available. 
Characterizing this variability has yielded growing insight 
into the physical processes powering the large AGN lumi- 
nosities - a trend that will accelerate as observations, data 
analysis, and theory proliferate. 

This paper outlines time series methods for analysis of 
the disparate data modes of radio, 7— ray, and other astro- 
nomical observations. The next section introduces a data 
structure that generalizes data modes traditionally used 
for time-sequential observations. This abstraction yields 
methods for estimating, from arbitrary time series data, 
including heterogeneous mixtures of data modes, all of the 
standard analysis functions: 

• light curves 

• autocorrelations 

• Fourier power and phase spectra 

• wavelet representations 

• structure functions 

• time-frequency distributions 

As indicated in Table 1, for essentially arbitrary data 
modes these methods yield amplitude and phase infor- 
mation for single or multiple time series (auto- and cross- 
analysis, respectively) - if desired, conditional on auxiliary 
variables. 

There are significant difficulties in the astrophysical 
interpretation of these quantities. The methods described 
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Table 1. Time Series Analysis for All Data Modes 
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here are of use in some of these, such as separation of ob- 
servational errors from stochastic source variability (both 
of which, unfortunately, are often called noise). But I do 
not discuss other more difficult problems, which are proba- 
bly beyond the scope of time series analysis methods, such 
as assessing the importance of cosmic variance, identify- 
ing causal or otherwise physically connected relationships 
in multi-wavelength time series data, etc. 

Subsequent sections discuss each of the above-listed 
functions and give sample applications. 

2. Abstract Data Cells 

The time series algorithms to be described below can be 
applied to almost any type of time-sequential astronomi- 
cal data. This generality is facilitated by identifying those 
features of the data modes that are necessary for analysis 
algorithms. 

Each individual act of measurement may yield a large 
set of data values relevant to estimation of the signal am- 
plitude, and its uncertainty, as a function of time. Of these, 
two pieces of information, related to the independent vari- 
able (tim43 of the measurement) and the dependent vari- 
able (amplitude of the signal at that time), are necessary 



1 In practice we always use a discrete time representation, 
such as a micro-second scale computer clock tick, or the finite 
time interval of signal averaging. 
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for any time series algorithm. In radio astronomy the typ- 
ical example is the measurement of the flux of a source 
averaged over a short interval of time. In 7-ray astronomy 
the typical example is the detection of individual photons. 
The arrival time of the photon is obviously the timing 
quantity, but what about the signal? One scheme is to 
represent an individual photon with a delta-function in 
time. But more information can be extracted by incorpo- 
rating the time interval^) between photons. Specifically, 
for each photon consider the interval starting half way 
back to the previous photon and ending half way forward 
to the subsequent photon. This interval, namely 



,t, 



■t, 



' 2 ■ 2 " (1) 

is the set of times closer to t n than to any other time|f| 
and has length equal to the average of the two intervals 
connected by photon n, namely 



At, 



<-n+l 



Then the reciprocal 
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is taken as an estimate of the signal amplitude correspond- 
ing to observation n. When the photon rate is large, the 
corresponding intervals are small. Figure 1 demonstrates 
the data cell concept, including the simple modifications 
to account for variable exposure and for weighting by pho- 
ton energy. 

Consider gaps in the data. By this we mean that there 
are portions of the total observation interval during which 
the detection system is completely off (exposure zero). 
This situation is readily handled by defining the start of 
the data cell for the first photon detected after the gap 
at the end time of the gap. Correspondingly the data cell 
for the last photon before the gap is set at the start time 
of the gap. The statistical nature of independent events 
assures that this procedure rigorously estimates the true 
photon rate at the edge of the gap. Of course, no informa- 
tion is available about the signal during such a gap, and 
the various algorithms deal with gaps accordingly. 

Now we consider data modes generally. Three com- 
mon examples are: (a) measurements of a quasi-continuous 
physical variable (eg. radio astronomy flux measurements) 
(b) the time of occurrence of discrete events (e.g. photons) 
and (c) counts of events in bins. The signal of interest is 
the time dependence of the measured quantity in case (a) , 
or of the event rate in case (b). Case (c) is actually very 
similar to (b) , but is often described as density estimation 



2 A method for analyzing event data based solely on inter- 
event time intervals has been developed in (|Prahl 1996[) . 

3 These intervals form the Voronoi tessel- 
lation of the total observation interval. See 
(Okabc, Boots, Sugihara and Chiu 2000) for a full dis- 
cussion of this construct, highly useful in spatial domains of 2, 
3, or higher dimension; see also (Scargle 2001a, Scargle 2001c). 
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Fig. 1. Voronoi cell of a photon. Three successive photon 
detection times are circles on the time axis. The vertical 
dotted lines underneath delineate the time cell, and the 
light rectangle is the local estimate of the signal ampli- 
tude. If the exposure at this time is less than unity, the 
length of the data cell shrinks in proportion (dt — > dt'), 
yielding a larger estimate of the true event rate (darker 
rectangle). The height of the rectangle is n/dt', where n is 
the number of photons at exactly the same time (almost 
always 1), or by the photon energy for a flux estimate. 

or determination of the event distribution function. In all 
cases it is useful to introduce the concept of cells to rep- 
resent the measurements. Letting x n be the estimate of 
the signal amplitude for a cell at time t n , a data set of N 
sequential observations is denoted 



C„ = {x„,£„}, n = 1,2,..., N. 



(4) 



The specific meaning of the quantities x„ depends on the 
type of data. For example, in the three cases mentioned 
above the array x contains (a) the sum or average over the 
measurement interval of an extensive or intensive quantity, 
respectively, plus one or more quantifiers of measurement 
uncertainty, (b) coordinates of events, such as photon ar- 
rival times, and (c) sizes and locations of the bins, and the 
count of events in them. 

A major reason for constructing this abstract data rep- 
resentation is that it unifies all data modes into a common 
format that makes construction of universal algorithms 
easy. As we will see in the next sections, even mixtures 
of data types - either in the sense of cross-analyzing two 
very different data types, or mixing data within a single 
time series - can be handled. 

3. Light Curve Analysis: Bayesian Blocks 

The simplest and most direct way to study variabil- 
ity is to construct a representation of the intensity of 
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the source as a function of time. More can be done 
than just plotting the intensity measurements as a func- 
tion of the time of the measurement. Smoothing, in- 
terpolation, gap filling, etc. are all techniques meant 
to enhance one's understanding of the variability. Here 
we discuss a different procedure, namely construction 
of a simple, generic, non-parametric model of the data 
that as much as possible shows the actual variability of 
the source, and minimizes the effect of observation er- 
rors. The model adopted is the simplest possible non- 
parametric representation of time series data, namely a 
piece-wise constant model. Details of this approach are 
given in QScargle, Norris, Jackson and Chiang 2010| ); the 
improved algorithm given there replaces the approximate 
one described in ( jScargle 1 998). 

The Bayesian Blocks algorithm finds the best parti- 
tion of the observation interval into blocks, such that 
the source intensity is modeled as varying from block 
to block, but constant within each block. This is just 
a step-function representation of the data. The mean- 
ing of the "best" model is the one that maximizes a 
measure of goodness-of-fit function described in detail 
in ( Scargle, Norris, Jackson and Chiang 2010 1 . Another 



change since the earlier reference is the use of a very sim- 
ple maximum likelihood fitness function, preferable to the 
Bayesian posterior previously used because it is invariant 
to a scale change in the time variable, thus eliminating a 
parameter from the analysis. 

Figure 2 shows the Bayesian blocks analysis of two 
AGN in the OVRO /Fermi joint program. The data shown 
are from somewhat earlier in the program, where the over- 
lap between the to instruments was not huge. Also these 
were just the first and third objects in the long list of 
observed sources, and were not particularly selected for 
being highly variable cases. 

4. Correlation Functions 

A rather underutilized technique for studying correlated 
variability of two observables (such as time series for dif- 
ferent wavelengths) is to construct a scatter plot of one 
against the other. If done carefully, this approach allows 
study of joint probability distributions for the two vari- 
ables; these contain more statistical information than cor- 
relation functions or any of the other functions discussed 
here. The challenges of this approach include the difficulty 
of depicting the all-important time-sequence connecting 
the points in the scatter plot, and the need to consider 
plotting lagged versions of the variables, for a number of 
values of the lag. The understanding that comes from care- 
ful study of scatter plots most often makes it worthwhile 
to conquer these difficulties. 

But probably the most used tool for studying statis- 
tical variability properties of a single time series is the 
auto-correlation function (ACF) or, for studying relations 
between the variability in two or more sets of simultane- 
ous time series, the cross-correlation function (CCF). The 
meaning of the latter can be understood by modeling one 
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Fig. 2. Bayesian Block representations of the lightcurves 
of 3C273 and 3C279, two AGN in the OVRO/Fermi 
project. The co-aligned times are in days, relative to an 
arbitrary zero point; amplitudes are on a common relative 
scale. Binned LAT data is shown for comparison, but the 
BB representation is based on the photon data only. 



Fig. 3. Summation schemes for autocorrelation functions. 
The points represent data cells, derived from measured 
values (as in radio astronomy) or time-tagged events (as 
in Fermi photon data). Top: Summation over data with 
arbitrary spacing in the Edelson and Krolik algorithm. 
From each point average over all points within a bin dr 
distant by t; t is binned, but t is not. Bottom: Standard 
lag summation over evenly spaced data. From each point 
(except near the ends) there is another point distant by 
exactly r = an integer multiple of At. 
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time series as a lagged version of the other, and evaluating 
the posterior distribution of the lag r, yielding 



P(r) 



(5) 



where K is a constant and Rx,y(t) is the cross-correlation 



function defined below ( Scargle 2001b ). 

Concentrating on the CCF, of which the ACF is really 
a special case, and following the notation and definitions 
of dPapoulis 1965[ |Papoulis 1977[ ), we have this definition 
of the cross- correlation function of two real processes x(i) 
and y(t) 



Rxy{tuh) =< { x(ti)y(i 2 ) } > 



(6) 



Assuming the processes are stationary, the time depen- 
dence is on only the difference r = £2 — ti and we have 



R xy {r) =< { x(t)y(t + r) } > 



(7) 



The symbol <> means the expected value, informally to 
be thought of as an average over realizations of the under- 
lying random process X. In data analysis this theoretical 
quantity is typically not known, and must be therefore be 
estimated from the data at hand, e.g. 



E[X(t)Y(t + T)} = 



1 



■E 



(8) 



where x n and y n are the samples of the variable X, 
and N(t) is the number of terms for which the sum can 
be taken. 

Figure 3 is a cartoon of the lag relationships for cor- 
relation functions of evenly spaced data (bottom) , as well 
as a solution to the difficulty posed by unevenly spaced 
time samples in general, and event data in particular. For 
a given sample or event at t n there will in general not be 
a corresponding one at t n + r, no matter what restriction 
is placed on r. 

For this problem an ingenious if straightforward algo- 
rithm (Edels on and Krolik 1988]) is in wide use. The basic 
idea is to pre-define a set of bins in the variable r in or- 
der to construct a histogram of the corresponding time 
separations r = t m — t„, weighted by the corresponding 
XnUm product. To be more specific, and modifying slightly 
Edelson and Krolik's formulas for our case (including not 
subtracting the process means), define for all measured 
pairs (x n ,y m ) the quantity 



UDFC nm = 



(9) 



4 Caution: It is common to center the processes about their 
means, to yield the cross-covariance and auto-covaraince func- 
tions. Such mean-removal can have unfortunate consequences, 
such as distortion of the low-frequency power spectrum. In ad- 
dition, the nomenclature is not completely standard. Various 
terms are used for the cases where the means of the processes 
have been subtracted off, and/or the resulting function nor- 
malized to unity at r = 0. 





Fig. 4. Autocorrelation functions for the same two AGN 
as in Figure 2 for radio and 7-ray data. Solid line with 
dark error band: OVRO 15 GhZ; Dotted line with light 
error band: Fermi LAT. 

(for Unbinned Discrete Correlation Function) where a x is 
the standard deviation of the AT-observations, e x is the 
Admeasurement error, and similarly for Y. The estimate 
of the correlation function is then 
1 



Rx V {t) 



UDCF n 



(10) 



where the sum is over the pairs, N T in number, for which 
t m — t n lies in the corresponding r-bin. 

There has been some confusion over the rationale for 
the denominator in eq. © (" ... to preserve the proper 
normalization") and how to estimate it. The quantity 
{a\ — e x ) is in principle the difference between the to- 
tal observed variance and that ascribed to observational 
errors. How they are estimated from source and calibra- 
tion data, and other instrumental considerations, no doubt 
varies from case to case. Edelson and Krolik discuss poten- 
tial corruption by correlated observational errors. I recom- 
mend following their advice to exclude the terms n = m 
from eq. (|10D only for autocorrelations, and then only if it 
is really necessary. These terms yield a spike in the auto- 
correlation function at r = 0, which can be a convenient 
visual assessment of the importance of the observational 
variance; it can be easily removed if needed. For CCFs 
it makes no sense to remove these terms, absent observa- 
tional errors correlated between the two observables. 

Auto- and cross- correlation involving photon event 
data is a simple matter of inserting the quantity in eq. ([3]) 
into eq. Since essentially any time series data mode 
yields at least surrogates for t n and x n , the same is true 
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t (days) 

Fig. 5. Cross-correlation functions for the same two AGN 
as in Figure 2, for radio and 7-ray data. 

in general. Figure 4 shows autocorrelation functions com- 
puted in this way, for the same AGNs shown in Figure 
2 and Figure 5 shows the corresponding cross-correlation 
functions. 



OVRO (dashed) vs. Fermi (solid) dot = 3c273 o = 3c279 




log 2 scale (days) 

Fig. 6. Wavelet Power (scalegrams) for OVRO and LAT 
data on 3c273 and 3c279, with the Haar Wavelet, logio of 
the power plotted against logi of time scale in days. 

in these spectral representations, but the rough power law 
characteristic of j processes can be seen, as well as the 
noise floor for the LAT data. 



5. Fourier Power and Phase Spectra 

Perhaps the most used time series analysis technique in 
astronomy is estimation of the Fourier power spectrum, 
mainly with the goal of detecting and then characterizing 
periodic signals hidden in noisy data, but also for ana- 
lyzing non-periodic signals such as quasi periodic oscilla- 
tions and colored, or y," noise. There are methods for di- 



rect estimation of Fourier power ( Scargle 1982 1 and phase 
( jScargle 1989[ ) spectra from time series data. However, it 
is often more convenient to make use of the well-known 
result that the power spectrum is the Fourier transform 
of the ACF computed as described above in 2) The slid- 
ing window power spectra depicted in ij8] were computed 
in this way. 

6. Wavelet Representations 

It is relatively straightforward to compute the wavelet 
transform for any time series that can be put into the stan- 
dard data cell representation. The wavelet shape (in this 
case the piecewise constant Haar wavelet) is integrated 
against the empirical signal amplitude assigned by the 
data cells. Figure 6 shows the scalegrams, or wavlet power 
spectra (Scargle et al. 1993]), for the same AGN data as in 
Figure 4. There is not enough data to yield much detail 



7. Structure Functions 

Another concept in wide usage is the structure 
function. For the most part its auto- and cross- 
versions are a repackaging of the same infor- 
mation contained in the corresponding correla- 
tions. This point has recently been emphasized by 



( Emmanoulopoulos, McHardy and Uttley 2010 ). In addi- 
tion to summarizing some of the caveats and problems 
associated with structure functions, these authors give a 
formal proof of the exact relation between structure func- 
tions and the corresponding auto- and cross-correlation 
functions. In addition, the literature contains a number 
of claims for the superiority of the structure function that 
seem unwarranted, especially in view of the relation just 
mentioned. An example is the misconception that struc- 
ture functions are somehow immune from sampling effects, 
including aliasing. Finally, some analysts believe that at 
short timescales the structure function always becomes 
flat; the actual generic behavior can be derived from eq. 
(A10) of dEmmanoulopoulos, McHardy and Uttley 2010] ); 
the normalized structure function satisfies 



NSF(t) = 2[1 - ACF(t)} -> Ct 2 



(11) 



for t — y 0, since autocorrelation functions are even in r. 
In practice this dependence may seem flat compared to 



(i 



Jeffrey D. Scargle: Cross-Analyzing Radio and 7-Ray Time Series Data: Fermi Marries Jansky 



steeper behavior at intermediate time scales, transitioning 
to the typical asymptotic loss of correlation at large time 
scale expressed as NSF(t) — > 2, correctly assessed as flat. 

A few other points perhaps favor the use of struc- 
ture functions (beyond the fact that they have been 
widely used in the past, and therefore arguably should 
be computed if only for comparison with previous work). 
When the structure and correlation functions are esti- 
mated from actual data, this equivalence result quoted 
above does not hold exactly. There can in fact be signifi- 
cant departures from the theoretical relations in Appendix 
A of ( Emmanoulopoulos, M cHardy and Uttley 2010[ ), due 
to end effects always present for finite data streams. In ad- 
dition, when measuring slope of powerlaw relationships it 
can be slightly more convenient to fit polynomials to the 
typical shape of a structure function than to the corre- 
sponding correlation function or power spectrum. 
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8. Time-Frequency Distributions 

The term time-frequency distribution refers to techniques 
for studying the time-evolution of the power spectrum of 
time series. This concept must deal with the fact that the 
spectrum is a property of the entire time interval, so that 
estimating it locally in time results in the need for trad- 
ing off time resolution against frequency resolution. See 
(jFlandrin 1 999) for a complete exposition of these issues. 

There are many algorithms for computing time- 
frequency distributions, but little has been done for the 
case of event data, one exception being the approach de- 



scribed in (Galleani, Cohen, Nelson, and Scargle (2001)). 
Although there are advanced techniques based on the 
Wigner-Ville distribution, Cohen's class of distribution, 
and others, in many applications the sliding window power 
spectrum is of considerable use. The idea is simple: com- 
pute the power spectrum of a subsample of the data within 
a restricted time-interval, small compared to the total in- 
terval. Information on the time dependence results from 
the fact that the window is slid along the observation in- 
terval. Information on frequency dependence is contained 
in the power spectrum. The tradeoff of time- and fre- 
quency resolution is mediated by the length of the time 
window: a short window yields high time resolution and 
low spectral resolution, and vice versa for a long win- 
dow. Implementation of this approach is straightforward 
through use of the techniques in ig] and 

Figure 7 shows sliding window power spectra com- 
puted, in this way, from time series data on four AGN 
provided by other authors at this workshop. These time- 
frequency distributions can show spectral details that are 
washed out in a power spectrum of the whole interval. 
In these cases there is little evidence for periodicities of 
any kind. Note that these are preliminary results, with no 
attempt being made to adjust the size of the window. 



Fig. 8. Sampling histograms: the distributions of the time 
intervals between the samples for the data in Figure 7. 



9. Conclusions 

Rather than regurgitating the discussion above, I end with 
a few practical suggestions. They may seem obvious or 
trivial, but I have found them surprisingly useful in prac- 
tice. 

When addressing time series data in the form of eq. 
(U]), the first step should be to study the time intervals 
t n+ i —t n ; in particular compute, plot, and study the their 
distribution with suitably constructed histograms. (Even 
if the provider of the data swears the times are evenly 
spaced, check it!) This often reveals many defects in the 
data, such as duplicate entries and observations out of 
order. The outliers of the distribution signal peculiari- 
ties, perhaps expected (such as known sampling irregu- 
larities, regularities, or semi-regularities) but often unex- 
pected surprises. Figure 8 shows examples from the data 
for which time-frequency distributions were shown above. 
The reader is invited to see what conclusions can be de- 
duced from these distributions. 

Don't subtract the mean value! Or at least do so with 
attention to its effects. Too often time series data are de- 
trended without careful consideration of the resulting ef- 
fects on the estimated functions. Mean removal is a special 
case of detrending. 

While there are some cases where the distinction be- 
tween stationary and non-stationary processes is impor- 
tant, with limited data it is difficult or impossible to make 
this distinction in practice. For different reasons, the dis- 
tinction between linearity and non-linearity is best left to 
the realm of physical models rather than data analysis. 
Linearity is a property of physical processes, and mathe- 
matical definitions QPriestly 19881 |Tong 19901 ) may or may 
not connect meaningfully to physical concepts. 
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Fig. 7. Time-frequency distribution for 4 AGN data sets: 3c 120 x-ray data from Chatterjee et al. (2009, ApJ, 704, 
1689) provided by Alan Marscher; optical, R magnitude data on OJ 278, by Villforth C, Nilsson K., Heidt J., et al., 
2010, MNRAS, 402, 2087, provided by Ivan Agudo, 37 GhZ observations of 3c 454 and 3c 279 from the Metsahovi 
Radio Observatory, provided by Anne Lahteenmaki. 



Finally, in thinking about AGN variability in general 
it is useful to think in terms of the mathematical concept 
of doubly stochastic (or Cox) processes. Essentially, this 
is a picture in which there are two distinct random pro- 
cesses: the intrinsic variability of the source (truly random, 
periodic, quasi-periodic, etc.) and the observation process. 
The latter is random due to observational errors from pho- 
ton counting, detector noise, background variability, etc. 
It is a major data analysis challenge to cleanly separate 



out the observational process to reveal the true variability 
of the astronomical source. 
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